Text File | 1991-10-01 | 15KB | 333 lines

LOVE Forth addressing and segmentation
--------------------------------------

Almost all languages have problems running on the 8086/88, but for
FORTH these problems are especially severe. Most FORTH systems on this
architecture are restricted to 64K of main memory for program and
data, and are referred to as small memory models. This restricts the
user and programs to a small amount of memory, but offers the highest
possible execution speed. 32 bit FORTHs have been produced that offer
a large address space, but performance has been severely degraded.

The segmentation approach taken in LOVE FORTH offers both a large
memory size (320K: five segments of up to 64K each) and very fast
execution speed. Rather than offer a large contiguous memory space,
LOVE Forth has divided up the Forth model by function. There are
separate segments for machine code, threaded addresses, data, stacks
and dictionary headers. As the source code is compiled it is parcelled
into these five segments. There is no execution time penalty over
Forths with the small memory model.

Note that this implementation is quite compatible with standard 16 bit
models. For example, @ (fetch) and ! (store) access the data segment
(the vast majority of FORTH programs use @ and ! to access data).
Another example is that the assembler always puts its code into the
code segment. The programmer need not worry that the code has been
separated from the rest of the program. Even though segmentation is
provided for in a logical fashion, some compiler words must be
implemented differently than in standard FORTH.

There are numerous indirect benefits to this segmentation, over and
above that of memory conservation. Target systems can easily be saved
without heads (the head segment is simply not written to disk). The
segments can be compressed to provide small target systems. And
because machine code is separated from threads, it is actually
possible to save space in the thread segment by re-coding some words
in machine code.
(The thread segment always fills the fastest). This gives
simultaneously a speed and size advantage.

Note that this conforms closely to the intended usage of the
architecture of 8086/88 microprocessors. The usual programming battle
with these processors is to overcome this limited architecture.

Here is a summary of the contents of each segment:

Segment  Description                                           Name
=======  ===========                                           ====
CODE     Contains 8086 machine code                            CS:
         pointed to by CS register

THREAD   Contains threaded address lists generated by          TS:
         high level words. The code field address
         points here.
         pointed to by DS register

DATA     holds data from variables, alphanumeric strings,      VS:
         and block buffers.
         pointed to by ES register

HEAD     holds the compile-time word headers, and              HS:
         vocabulary links.
         (segment value calculated when req'd)

STACK    holds the parameter, return and vocabulary            SS:
         stacks and local variables, if used.
         pointed to by SS register

Each segment has a corresponding dictionary pointer, and a set of
basic manipulation words such as CS:@ or HS:, . Note that all the
addresses within these segments are 16 bits. The programmer must
specify the segment to be operated upon by the type of operator used
(eg. @, TS:@, CS:@ etc.)

As MS-DOS tends to vary the position in RAM at which a program is
loaded, each segment also has a word to return the actual position of
the segment (GET:CS, GET:SS etc.). The handy command MEM-MAP displays
all the segments, and their respective dictionary pointers.

In this documentation and elsewhere, addresses are abbreviated. For
example TS:addr represents an address in the thread segment. Simply
'addr' refers to the variable segment (most often used). Some names
assume a segment: for example, 'compilation address' is always in the
thread segment, and 'name field address' is always in the head
segment.


CODE SEGMENT
------------

This is the only segment that contains 8086/8088 machine code.
Apart from the space taken by a few pointers used in CREATE DOES>
words, this allows code to reach a full 64K. The assembler places the
definition body into the code segment automatically. This is always
the lowest of the 5 segments. Startup code in this segment sets up
the other segments. This segment contains the MS-DOS "PSP" (program
segment prefix) in the first 256 bytes, in version 1.28 and prior
ones. Use !!GET:PSP! in newer versions.

Basic operators: CS:C@ CS:@ CS:! CS:C! CS:, CS:C, CS:HERE

These are analogous to the standard words: C@ @ ! C! , C, and HERE,
but operate on the code segment.

'CODE operates like ' but returns the address of the executed code
extracted from the compilation address. For example all : words
return the same value from 'CODE because they all call the common
code for nesting colon definitions. It is thus most useful with CODE
words, where it returns the address of the code loaded by the
assembler.

CS:DUMP ( CS:addr, #bytes -- )
    is a utility that allows bytes to be dumped from this segment.

There are also some system 'variables' which are used, for example,
at start-up before all the segments have been loaded or properly
positioned.

    TOPSEG STACKSIZE TOPSEG SEGPAK LOVEF CSEG TSEG VSEG HSEG SSEG

The current segment (position in RAM) is returned by:

    GET:CS (8086 CS register)


THREAD SEGMENT
--------------

Forth high-level (:) words are compiled into a sequence of 16 bit
addresses, called threads. This segment contains these threads,
CONSTANT and LITERAL values, and pointers to data and code. In the
majority of applications this segment fills up the fastest.

Basic operators: TS:@ TS:! TS:, TS:HERE

Note that there are no single byte operators - all elements in this
segment are two bytes.

EXECUTE ( TS:addr -- )
    Accepts the code field address.

TS:DUMP ( TS:addr, #bytes -- )
    Dumps bytes from the specified address.
Many words with compile-time usage accept or return addresses in this
segment:

    ' ['] -FIND ( -- TS:addr )
    FIND ( VS:addr -- VS:addr, 0 or TS:addr, n )

Words created with the following return a thread segment address at
run-time:

    CREATE: (alone) or CREATE: DOES:> ( pair)

The most often used words for creation are CREATE and CREATE DOES>
(pair). See the Variable segment (below).

In addition the following words add to this segment and have
functions as expected:

    COMPILE [COMPILE] wordname LITERAL DLITERAL

See also the technical note on L.O.V.E. Forth compatibility for
examples of compile-time word usage.

TS:BODY> TS:>BODY ( TS: addr -- TS: addr )
    are like >BODY and BODY> but operate on the thread segment only.
    (see discussion of 'Field access operators' below)

>BODY ( TS:addr -- VS:addr )
    operates in LOVE Forth to accept a code field address of a
    VARIABLE (or word created by CREATE) and return the data field
    address.

>LINK >NAME ( TS:addr -- HS:addr )
    are used to access the dictionary header of the specified word.
    If TS:addr is not a valid code field address, an error message is
    displayed.

NAME> LINK> ( HS:addr -- TS:addr )
    are used to find the compilation address from the head address.

FIND-1VOC FIND-VOCS ( addr, addr -- TS:addr,true or false)
    are used by FIND - address of word to find (usually at HERE) and
    vocab body input and cfa output (if found).

GET:TS - returns current segment value (8086 DS register)


VARIABLE (DATA) SEGMENT
-----------------------

This segment is accessed the most often by application programs. This
contains the data for variables, alphanumeric strings compiled by ."
and " , BLOCK buffers, text input buffer (TIB), PAD, HERE and where
space is allocated for programmer defined data structures. Most
standard Forth memory access words work relative to this segment.

Basic operators: @ ! C@ C! C, , D@ D! +! +C!
    TYPE ALLOT TOGGLE BMOVE CMOVE CMOVE> FILL ENCLOSE EXPECT COUNT
    TYPE -TRAILING CONVERT NUMBER #> HERE PAD WORD BLOCK LIMIT FIRST
    BUFFER +BUF R/W

Various I/O words: L->CRT N$ N$. 'STREAM

TIB, HLD and other VARIABLEs all return addresses in VS:

File name strings passed into DOS words:

    <OPEN> OPEN <CREATE> FCREATE INQUIRE <CREATE-NEW> CREATE-NEW
    DELETE RENAME CHDIR

Other DOS words: READ WRITE ENV-SRCH DIR-GET ASCIIZ. ASCIIZ"

- words created by VARIABLE DVARIABLE CREATE CREATE ... DOES>

DUMP ( addr, #bytes -- )
    Dumps the specified bytes.

GET:VS - returns current segment value (8086 ES register)


HEAD SEGMENT
------------

The head segment is normally used during compilation only. It
contains the header part of a Forth word definition, including name,
dictionary links and pointers to the locations of the word in other
segments. This segment may be discarded when creating a stand-alone
application program. Utilities such as WORDS and FORGET access this
segment automatically.

Basic operators: HS:@ HS:! HS:C@ HS:C! HS:, HS:C, TRAVERSE
    N>LINK L>NAME LINK> NAME> .ID LAST

HS:HERE returns the next available address in this segment.

GET:HS - returns current head segment value (calculated)

Note: TOGGLE does not act on HS: (often used to toggle header bits)

Note: the form of the head segment is subject to change in future
versions by the authors without prior notice.


STACK SEGMENT
-------------

This segment holds the Forth parameter, return, vocabulary and local
variables stacks. The operation of words on this segment is
transparent to the programmer. During development, allowing a full
64K to the stack segment means that system crashes due to stack
overflow are minimized.

Basic operators: SS:@ SS:! .S

SP@ RP@ LP@ ( -- SS:addr )
    These words return stack limits or current positions.

S0 is a variable that contains the address of the bottom of stack.

SS:HERE is the dictionary pointer in this segment, but is currently
unused by any words in L.O.V.E.
Forth and may be used by the programmer if so desired.

GET:SS - returns current segment value (8086 SS register)


Field Access Operators
======================

Every word in Forth has a number of parts or fields. These include
the name, link, code and parameter fields. Field access operators are
used to gain access to the various portions of Forth words. In
L.O.V.E. Forth, as the parts of words are parcelled between segments,
many of these operators accept an address in one segment and deliver
an address in another.

Here is a summary of the standard field access operators and their
functions in LOVE Forth.

>BODY ( TS:addr -- addr )
    accepts a code field address of a VARIABLE (or word created by
    CREATE) and returns the data field address (in VS:).

TS:>BODY TS:BODY> ( TS: addr -- TS: addr )
    are like >BODY and BODY> but operate on the thread segment only.
    Given the compilation address, TS:>BODY returns the address of
    the first threaded address (of a : definition), the data field of
    a CONSTANT, or the address pointer of a VARIABLE.

    Note that there are thus two types of >BODY. >BODY could be
    rewritten:

        : >BODY TS:>BODY TS:@ ;

>LINK >NAME ( TS:addr -- HS:addr )
    are used to access the dictionary header of the specified word.
    If TS:addr is not a valid compilation address, an error message
    is displayed and execution is ABORTed.

NAME> LINK> ( HS:addr -- TS:addr )
    are used to find the compilation address from the address of the
    header's name or link field.

N>LINK L>NAME ( HS:addr -- HS:addr )
    are used to move between the name and link fields which are both
    in the head segment.

Note that there is no word BODY> to move from the VS: parameter
address of a VARIABLE or CREATEd word to the compilation address.
This is not supported in L.O.V.E. Forth.


Long Operators
==============

LOVE Forth contains a set of basic operators which operate on any
area of memory. These words allow the specification of both the
segment and address of the word to be operated upon.
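As background on what a segment plus a 16 bit address means to the
8086 itself, here is a small sketch in Python. It is not LOVE Forth
code: it only models the processor's standard real-mode address
calculation, and the function name physical_address is made up for
this illustration.

```python
def physical_address(seg: int, addr: int) -> int:
    """Return the 20 bit physical address for an 8086 real-mode seg:addr pair.

    Illustrative model only; the name is hypothetical, not a LOVE Forth word.
    """
    if not (0 <= seg <= 0xFFFF and 0 <= addr <= 0xFFFF):
        raise ValueError("segment and address must each fit in 16 bits")
    # The 8086 multiplies the segment value by 16 (shift left 4 bits), adds
    # the 16 bit address, and keeps 20 bits, so the result wraps at 1 MB.
    return ((seg << 4) + addr) & 0xFFFFF


if __name__ == "__main__":
    # Segment 1234h, address 0010h
    print(hex(physical_address(0x1234, 0x0010)))
```

Because segments start every 16 bytes, different segment and address
pairs can name the same byte of memory. This is why a segment value
such as the one returned by GET:CS has to accompany an address before
it identifies a unique location.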
Basic operators: @L !L C@L C!L BMOVEL

Some disk operators will operate on any segment:

    READL WRITEL <READL> <WRITEL> RWTSL EXEC

ENV-SRCH ( string -- seg, addr, f or t )
    returns both segment and address of DOS environment

DUMPL ( seg,addr,#bytes -- )
    Allows memory to be dumped relative to any segment.


Memory map
----------

The dictionary pointers move up as more is compiled. Certain words
only use certain segments (eg. a CONSTANT occupies only the thread
and head segments). When any of the dictionary pointers reaches
within 400 bytes of the maximum available address a warning message
is displayed 'GETTING CLOSE TO FULL'.

The maximum available address in each segment is dependent on several
things. Virtual vocabularies are loaded in high memory, disk buffers
are also here (in the VS: only - minimum of 2k bytes). The current
maximum addresses are always stored in the VARIABLE TOPS (contains
one cell for each of CS: TS: VS: and HS:). If the program is very
large, it is best to remove any resident virtual vocabulary with
FORGET-SYS.
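
The 'GETTING CLOSE TO FULL' test described above can be pictured with
the following Python sketch. It is a model of the rule as stated, not
the actual LOVE Forth implementation. The function name, the
dictionaries and the choice to treat exactly 400 bytes remaining as
"within 400 bytes" are all assumptions made for illustration.

```python
WARNING_MARGIN = 400  # bytes, the figure given in the text

# One entry per segment that has a cell in TOPS, per the text.
SEGMENTS = ("CS", "TS", "VS", "HS")


def segments_close_to_full(pointers, tops, margin=WARNING_MARGIN):
    """Return the segments whose dictionary pointer is near its maximum.

    pointers: current dictionary pointer for each segment (hypothetical input)
    tops:     maximum available address for each segment (models TOPS)
    """
    # Assumption: a segment counts as close to full when the space left
    # between its dictionary pointer and its maximum is margin bytes or less.
    return [name for name in SEGMENTS if tops[name] - pointers[name] <= margin]


if __name__ == "__main__":
    # Made-up values: only the thread segment is nearly full here.
    pointers = {"CS": 0x1000, "TS": 0xF000, "VS": 0x2000, "HS": 0x3000}
    tops = {"CS": 0xFFFF, "TS": 0xF100, "VS": 0xE000, "HS": 0xFFFF}
    for name in segments_close_to_full(pointers, tops):
        print(name + ": GETTING CLOSE TO FULL")
```

With the made-up values shown, only the thread segment has 400 bytes
or fewer remaining, which matches the remark that the thread segment
usually fills the fastest.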